Towards Label-free Scene Understanding by Vision Foundation Models
Vision foundation models such as Contrastive Vision-Language Pre-training
(CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot
performance on image classification and segmentation tasks. However, the
incorporation of CLIP and SAM for label-free scene understanding has yet to be
explored. In this paper, we investigate the potential of vision foundation
models in enabling networks to comprehend 2D and 3D worlds without labelled
data. The primary challenge lies in effectively supervising networks under
extremely noisy pseudo labels, which are generated by CLIP and further
exacerbated during the propagation from the 2D to the 3D domain. To tackle
these challenges, we propose a novel Cross-modality Noisy Supervision (CNS)
method that leverages the strengths of CLIP and SAM to supervise 2D and 3D
networks simultaneously. In particular, we introduce a prediction consistency
regularization to co-train the 2D and 3D networks, and then further enforce
latent-space consistency between the networks using SAM's robust feature
representations. Experiments conducted on diverse indoor and outdoor datasets
demonstrate the superior performance of our method in understanding 2D and 3D
open environments. Our 2D and 3D networks achieve label-free semantic
segmentation with 28.4% and 33.5% mIoU on ScanNet, improvements of 4.7% and
7.9%, respectively. On the nuScenes dataset, our method reaches 26.8% mIoU, an
improvement of 6%. Code will be released at
https://github.com/runnanchen/Label-Free-Scene-Understanding
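The abstract names two supervision terms: a prediction-consistency regularization between the 2D and 3D networks, and a latent-space consistency anchored on SAM features. The paper's exact loss formulations are not given here, so the following NumPy sketch is only an assumed illustration: it uses a symmetric KL divergence for the prediction term and cosine distances for the latent term, and all function names are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over class logits."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prediction_consistency(logits_2d, logits_3d):
    """Symmetric KL divergence between the 2D and 3D networks' class
    predictions for the same points (an assumed form of the co-training
    regularizer, not necessarily the paper's)."""
    p, q = softmax(logits_2d), softmax(logits_3d)
    eps = 1e-8
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)), axis=-1)
    return float(np.mean(0.5 * (kl_pq + kl_qp)))

def latent_consistency(feat_2d, feat_3d, feat_sam):
    """Pull both networks' latent features toward SAM's feature
    representation via cosine distance (again an assumed form)."""
    def cos_dist(a, b):
        a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + 1e-8)
        b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + 1e-8)
        return 1.0 - np.sum(a * b, axis=-1)
    return float(np.mean(cos_dist(feat_2d, feat_sam)
                         + cos_dist(feat_3d, feat_sam)))
```

Both terms vanish when the two networks (and SAM) agree, so minimizing their sum drives the 2D and 3D branches toward a shared, SAM-regularized prediction.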
Vid2Curve: Simultaneous Camera Motion Estimation and Thin Structure Reconstruction from an RGB Video
Thin structures, such as wire-frame sculptures, fences, cables, power lines,
and tree branches, are common in the real world. It is extremely challenging to
acquire their 3D digital models using traditional image-based or depth-based
reconstruction methods because thin structures often lack distinct point
features and have severe self-occlusion. We propose the first approach that
simultaneously estimates camera motion and reconstructs the geometry of complex
3D thin structures in high quality from a color video captured by a handheld
camera. Specifically, we present a new curve-based approach to estimate
accurate camera poses by establishing correspondences between featureless thin
objects in the foreground across consecutive video frames, without requiring
visual texture in the background scene to lock onto. Enabled by this effective
curve-based camera pose estimation strategy, we develop an iterative
optimization method with measures tailored to geometry, topology, and
self-occlusion for reconstructing 3D thin structures. Extensive
validations on a variety of thin structures show that our method achieves
accurate camera pose estimation and faithful reconstruction of 3D thin
structures with complex shape and topology at a level that has not been
attained by other existing reconstruction methods.
Comment: Accepted by SIGGRAPH 202
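The pose-estimation idea above matches featureless curve points between frames and then solves for camera motion. As a loose toy analogue of that match-then-solve loop (not the paper's algorithm, which estimates full 6-DoF camera poses from 3D curves), here is a 2D curve registration sketch using nearest-point correspondences and a closed-form rigid fit; the function name and setup are assumptions for illustration only.

```python
import numpy as np

def align_curves_2d(src, dst, iters=30):
    """Toy match-then-solve registration of two 2D curves.

    Each iteration (a) matches every source sample to its nearest point
    on the target curve -- geometry alone drives the matching, since the
    points carry no visual features -- and (b) solves the best rigid
    transform in closed form (Kabsch/Procrustes). Returns (R, t) such
    that src @ R.T + t approximates dst.
    """
    R, t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        cur = src @ R.T + t
        # (a) nearest-neighbour correspondences
        d = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=-1)
        matched = dst[np.argmin(d, axis=1)]
        # (b) closed-form rigid fit between centered point sets
        mu_c, mu_m = cur.mean(axis=0), matched.mean(axis=0)
        H = (cur - mu_c).T @ (matched - mu_m)
        U, _, Vt = np.linalg.svd(H)
        S = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ S @ U.T          # reflection-safe rotation
        t_step = mu_m - mu_c @ R_step.T
        # compose the incremental transform with the running estimate
        R = R_step @ R
        t = R_step @ t + t_step
    return R, t
```

As in the paper's setting, the matching step needs no background texture; iterating the two steps refines correspondences and pose together, which is the core intuition behind the curve-based strategy.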